San Francisco Crime and Topography

R GIS

Visitors and residents of San Francisco are familiar with the city's hilly terrain and roads that climb into the clouds. But does this topography affect the likelihood of car break-ins? This regression analysis explores that question.

Alex Clippinger
2021-11-02

A particularly steep San Francisco street

That is one steep street!

Research Question

My research question is: "What is the relationship between topography and car break-ins in San Francisco?"

Background

This analysis focuses on how elevation and “hilliness” relate to the likelihood of car break-ins in San Francisco. Both the terrain and motor vehicle crime come up constantly in conversations about living in or visiting the city. The supposed relationship between them is summed up by the phrase “Crime Doesn’t Climb”.

In April 2021, Young-An Kim and James C. Wo published “Topography and crime in place: The effects of elevation, slope, and betweenness in San Francisco street segments” (Kim and Wo 2021). Their study provides a robust regression analysis of the effects of elevation, slope, and “hilliness” on crime, controlling for socio-demographic characteristics, and also delves into street connectivity and the city’s road networks as factors in crime occurrence. My analysis focuses only on car break-ins rather than all crime reports, as I believe these crimes will have an even more pronounced relationship with topography.

Data Collection

In order to collect data, we first need to identify our key variables.

Important Variables

When discussing topography, both elevation and “hilliness,” or slope, should be included. Using both captures the effect of local-level topography more accurately, an approach supported by Kim & Wo.

In any econometric analysis, it is vital to control for socio-economic variables. In this case, it could be that higher elevations in the city are more affluent areas, which may have an impact on crime. Thus, we want to include median income as a control variable.

Here is the regression equation:

\[MotorVehicleTheft_i = \beta_0 + \beta_1Elevation_i + \beta_2Slope_i + \beta_3MedianIncome_i + u_i\]
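As a rough illustration of how this specification maps onto R's formula syntax, here is a minimal sketch that fits the same functional form to simulated data. The data frame `sim`, its column names, and every number in it are invented for illustration and are not the crime data used in this post.

```r
# Simulated stand-in data; all values are invented for illustration only
set.seed(1)
n <- 200
sim <- data.frame(
  elev          = runif(n, 0, 900),        # elevation in feet
  slope         = runif(n, 0, 30),         # slope in percent
  median_income = runif(n, 40000, 200000)  # median income in USD
)

# Generate an outcome from a known linear relationship plus noise
sim$thefts <- 10 - 0.01 * sim$elev - 0.15 * sim$slope -
  0.00001 * sim$median_income + rnorm(n)

# thefts_i = b0 + b1 * elev_i + b2 * slope_i + b3 * median_income_i + u_i
fit <- lm(thefts ~ elev + slope + median_income, data = sim)
coef(fit)
```

The intercept \(\beta_0\) is added automatically by `lm()`, so only the three regressors appear on the right-hand side of the formula.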

Data Sources

Analysis Plan

The steps of the analysis plan are as follows:

  1. Identify question*
  2. Select key variables (based on existing literature)*
  3. Collect data*
  4. Merge data
  5. Visualize simple relationships
  6. Test OLS assumptions
  7. Conduct regression analysis
  8. Interpret results

* Completed in previous sections

Merge Data

Show code
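# Packages assumed by the code in this post (the original setup chunk is not shown)
library(tidyverse); library(sf); library(tidycensus)
library(patchwork); library(modelr); library(jtools)

# EPSG:7132 - NAD83(2011) / San Francisco CS13 (ftUS) - Projected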
#crimes <- st_read("data/crime_reports_2018_present/crime_reports_2018_present.shp", query = "SELECT date_incid FROM crime_reports_2018_present WHERE incident_2 = 'Motor Vehicle Theft'") %>% 
#  st_transform(crs = st_crs(7132))

crimes <- st_read("data/crime_reports_2018_present/motor_vehicle_theft/motor_vehicle_theft_w_slope.shp", query = "SELECT date_incid, slope FROM motor_vehicle_theft_w_slope WHERE slope IS NOT NULL")
Reading query `SELECT date_incid, slope FROM motor_vehicle_theft_w_slope WHERE slope IS NOT NULL' from data source `C:\Users\clipp\Documents\UCSB\Website\alexclippinger.github.io\_posts\2021-11-02-sf-crime-and-hilliness\data\crime_reports_2018_present\motor_vehicle_theft\motor_vehicle_theft_w_slope.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 24454 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 139763.1 ymin: 63441.8 xmax: 181072.3 ymax: 99940.44
z_range:       zmin: 0 zmax: 68.33842
Projected CRS: NAD83(2011) / San Francisco CS13 (ftUS)
Show code
contours <- st_read("data/contours/contours.shp") %>% 
  st_transform(crs = st_crs(7132))
Reading layer `contours' from data source 
  `C:\Users\clipp\Documents\UCSB\Website\alexclippinger.github.io\_posts\2021-11-02-sf-crime-and-hilliness\data\contours\contours.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 13901 features and 4 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -122.5177 ymin: 37.69998 xmax: -122.357 ymax: 37.81103
Geodetic CRS:  WGS84(DD)
Show code
# developments on parcels with slopes of 20%+. Need to buffer to include streets
#slopes_over_20 <- st_read("data/slopes_over_20/buffered_slope_areas.shp", query = "SELECT objectid as slope_over_20 FROM buffered_slope_areas") %>% 
#  st_transform(crs = st_crs(7132)) 

# need to run census_api_key() first
census_geom <- get_acs(geography = 'tract',
                       variables = "B19013_001",
                       state = "CA",
                       county = "San Francisco",
                       geometry = TRUE) %>% 
  st_transform(crs = st_crs(7132))


The following code chunk demonstrates how the crime, elevation, slope, and income datasets were merged.

Show code
# Find the index of the nearest contour line to each crime
nearest_idx <- st_nearest_feature(x = crimes, y = contours)

# Attach tract-level median income (spatial join) and the elevation of the nearest contour
crimes <- crimes %>% 
  st_join(y = census_geom, join = st_within, left = TRUE) %>% 
  mutate(elev = contours$elevation[nearest_idx]) %>% 
  rename(median_income = estimate) %>% 
  select(date_incid, slope, median_income, elev, geometry)
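To make the two spatial operations in the merge easier to follow, here is a minimal sketch on toy geometries. The objects `toy_crimes`, `toy_contours`, and `toy_tracts` and all of their coordinates and values are hypothetical, and no coordinate reference system is set, so distances are plain planar units.

```r
library(sf)

# Toy data (hypothetical): two crime points, two contour lines with known
# elevations, and one square "tract" polygon carrying an income estimate
toy_crimes <- st_sf(
  id = c(1, 2),
  geometry = st_sfc(st_point(c(1, 1)), st_point(c(4, 4)))
)
toy_contours <- st_sf(
  elevation = c(50, 200),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(2, 0))),
    st_linestring(rbind(c(3, 5), c(5, 5)))
  )
)
toy_tracts <- st_sf(
  estimate = 90000,
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(3, 0), c(3, 3), c(0, 3), c(0, 0))))
  )
)

# Step 1: for each point, the row index of the closest contour line
nearest_idx <- st_nearest_feature(toy_crimes, toy_contours)

# Step 2: left join keeps every point; points outside all tracts get NA
toy_merged <- st_join(toy_crimes, toy_tracts, join = st_within, left = TRUE)

# Step 3: look up the elevation of each point's nearest contour
toy_merged$elev <- toy_contours$elevation[nearest_idx]
toy_merged
```

The second point falls outside the toy tract, so its `estimate` is `NA`. In the regression output later in this post, rows like this would be among those reported as missing observations.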

Visualize simple relationships

The following plots show the simple relationships between the number of car break-ins and each of the three independent variables (elevation, slope, and median income).

Show code
# Group by income only
income_summary <- crimes %>% 
  st_drop_geometry() %>% 
  group_by(median_income) %>% 
  summarize(count = n())

income_plot <- ggplot(data = income_summary, aes(x = median_income, y = count)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic() +
  labs(title = "Crime and Median Income",
       x = "Median Income (USD)",
       y = "Number of Break-Ins")

# Group by elevation
elev_summary <- crimes %>% 
  st_drop_geometry() %>% 
  group_by(elev) %>% 
  summarize(count = n())

elev_plot <- ggplot(data = elev_summary, aes(x = elev, y = count)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic() +
  labs(title = "Crime and Elevation",
       x = "Elevation (feet)",
       y = "Number of Break-Ins")

# Group by slope
slope_summary <- crimes %>% 
  st_drop_geometry() %>% 
  group_by(slope) %>% 
  summarize(count = n())

slope_plot <- ggplot(data = slope_summary, aes(x = slope, y = count)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_classic() +
  labs(title = "Crime and Slope",
       x = "Slope (percent)",
       y = "Number of Break-Ins")

elev_plot + (slope_plot / income_plot)

We see there is a negative correlation between elevation and crime. It is possible that this relationship is not linear, so this variable may fit better if we take the natural log. Since slope in this analysis is a binary variable, we simply see that there are more crimes in areas that are not designated high slope. Lastly, we observe a weak negative correlation between median income and car break-ins.
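One way to check whether a log transformation of a predictor fits better is to compare the fit of the two specifications. The sketch below does this on simulated data that is deliberately generated from a logarithmic relationship; the data frame `sim_elev` and its values are invented, so the result says nothing about the actual crime data.

```r
# Simulated data (invented): counts fall with the log of elevation,
# plus a small deterministic wiggle standing in for noise
elev  <- seq(10, 500, by = 10)
count <- 100 - 15 * log(elev) + sin(seq_along(elev))
sim_elev <- data.frame(elev, count)

# Fit a linear-in-elevation and a linear-in-log-elevation specification
r2_linear <- summary(lm(count ~ elev, data = sim_elev))$r.squared
r2_log    <- summary(lm(count ~ log(elev), data = sim_elev))$r.squared

c(linear = r2_linear, log = r2_log)
```

Because both models have the same dependent variable, their R² values are directly comparable. Note that `log()` requires strictly positive elevations, so sea-level observations would need separate handling.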

The following plots show crimes over the time period of our data.

Show code
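# Count break-ins for each unique combination of slope, median income, and elevation
# (these counts are the dependent variable in the regressions below)
crimes_summary <- crimes %>% 
  st_drop_geometry() %>% 
  group_by(slope, median_income, elev) %>% 
  summarize(count = n(), .groups = "drop")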

# Group by date
crimes_ts <- crimes %>% 
  st_drop_geometry() %>% 
  group_by(date_incid) %>% 
  summarize(count = n())

ts_plot <- ggplot(data = crimes_ts, aes(x = date_incid, y = count)) +
  geom_point() +
  geom_smooth(method = "lm") +
  theme_classic() +
  labs(title = "Daily Crime 2018-Present",
       x = "Date (daily)",
       y = "Number of Break-Ins")

crimes_monthly <- crimes_ts %>% 
  mutate(month = lubridate::floor_date(date_incid, "month")) %>% 
  group_by(month) %>% 
  summarize(monthly_sum = sum(count)) %>% 
  filter(month != as.Date("2021-11-01"))  # exclude November 2021 (partial month at time of writing)

monthly_plot <- ggplot(data = crimes_monthly, aes(x = month, y = monthly_sum)) +
  geom_point() +
  geom_line() + 
  #geom_smooth(method = "lm") +
  theme_classic() +
  labs(title = "Monthly Crime 2018-Present",
       x = "Date (monthly)",
       y = "Number of Break-Ins")

ts_plot + monthly_plot

The crime data show an upward trend over the study period.

Show code
bbox <- st_bbox(census_geom)

library(tmap)
tmap_mode("view")
tm_shape(crimes)+
  tm_dots("elev")
Show code
#elev_map <- ggplot(data = crimes) +
#  geom_sf(aes(color = elev)) +
#  theme_minimal() +
#  theme(axis.text = element_blank(),
#        legend.position = "bottom",
#        legend.title = element_blank()) +
#  labs(title = "Crime by Elevation")

#slope_map <- ggplot(data = crimes) +
#  geom_sf(aes(color = slope)) +
#  theme_minimal() +
#  theme(axis.text = element_blank(),
#        legend.position = "bottom",
#        legend.title = element_blank()) +
#  labs(title = "Crime by Slope")

#income_map <- ggplot(data = crimes) +
#  geom_sf(aes(color = median_income)) +
#  theme_minimal() +
#  theme(axis.text = element_blank(),
#        legend.position = "bottom",
#        legend.title = element_blank()) +
#  labs(title = "Crime by Median Income")

#elev_map #+ (slope_map / income_map)

Regression Analysis

Show code
model <- lm(formula = count ~ elev + slope + median_income, data = crimes_summary)

#model %>% 
#  summary() %>% 
#  xtable() %>% 
#  kable()

summ(model) 
Observations          4391 (23 missing obs. deleted)
Dependent variable    count
Type                  OLS linear regression

F(3,4387)    151.47
R²           0.09
Adj. R²      0.09

                 Est.     S.E.    t val.    p
(Intercept)      10.15    0.36    28.27     0.00
elev             -0.01    0.00    -14.89    0.00
slope            -0.15    0.02    -6.88     0.00
median_income    -0.00    0.00    -3.43     0.00

Standard errors: OLS

Transformation

Show code
model_transformed <- lm(formula = log(count) ~ elev + slope + median_income, data = crimes_summary)

summ(model_transformed)
Observations          4391 (23 missing obs. deleted)
Dependent variable    log(count)
Type                  OLS linear regression

F(3,4387)    247.60
R²           0.14
Adj. R²      0.14

                 Est.     S.E.    t val.    p
(Intercept)      1.88     0.04    48.15     0.00
elev             -0.00    0.00    -18.72    0.00
slope            -0.02    0.00    -9.09     0.00
median_income    -0.00    0.00    -4.69     0.00

Standard errors: OLS

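Because the dependent variable is now log(count), each coefficient above is approximately the proportional change in the number of car break-ins associated with a one-unit increase in that variable, holding the others constant. For example, the slope coefficient of -0.02 corresponds to roughly 2% fewer break-ins for each one-unit increase in slope.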
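For a model with a logged dependent variable, a coefficient \(\beta\) converts to an exact percent change of \(100 \times (e^{\beta} - 1)\) per one-unit increase in the predictor. The helper `pct_change()` below is a hypothetical convenience function written for this illustration, applied to the rounded slope coefficient reported above.

```r
# Hypothetical helper: percent change in the outcome per one-unit increase
# in a predictor, given its coefficient from a log(y) ~ x regression
pct_change <- function(beta) {
  100 * (exp(beta) - 1)
}

# Rounded slope coefficient from the transformed model above
slope_effect <- pct_change(-0.02)
slope_effect
```

For small coefficients the result is close to \(100 \times \beta\), which is why a coefficient of -0.02 is often read directly as "about 2% fewer". Coefficients that display as -0.00 in the table are rounded, so the same conversion would need the unrounded values from `coef(model_transformed)`.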

Test OLS Assumptions

Recall our four key assumptions for OLS:

  1. The population relationship is linear in parameters with an additive disturbance.

  2. Our \(X\) variables are exogenous, i.e., \(\mathop{\boldsymbol{E}}\left[ u \mid X \right] = 0\).

  3. The \(X\) variables have variation.

  4. The population disturbances \(u_i\) are independently and identically distributed as normal random variables with mean zero \(\left( \mathop{\boldsymbol{E}}\left[ u \right] = 0 \right)\) and variance \(\sigma^2\) (i.e., \(\mathop{\boldsymbol{E}}\left[ u^2 \right] = \sigma^2\))

We assume that assumptions 1 and 2 hold. Assumption 3 holds because the plots above show variation in each \(X\) variable. Below, residuals from the main regression are used to assess the three components of assumption 4: independence, identical distribution (constant variance), and normality.

Show code
crime_lm <- crimes_summary %>% 
  add_predictions(model = model) %>%  # reuse the OLS model fitted above
  mutate(res = count - pred)

ggplot(data = crime_lm, aes(x = res)) +
  geom_histogram(binwidth = 5, fill = "darkblue", col = "black") + 
  labs(title = "Residual Plot",
       x = "Residuals",
       y = "Count") +
  theme_classic()

The residuals do not appear to be normally distributed. There is a long right tail with high outliers, which indicates that the regression substantially under-predicts the largest counts of car break-ins.

Show code
ggplot(data = crime_lm, aes(sample = res)) +
  geom_qq() +
  geom_qq_line() +
  theme_classic() +
  labs(title = "QQ Plot",
       x = "Theoretical Normal Distribution",
       y = "Analysis Distribution")

The QQ plot shows that the residuals track the normal distribution closely for theoretical quantiles up to approximately 2. Beyond that point they depart from the reference line, which suggests that the residuals are not normally distributed at extreme high values.
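The visual checks above could be complemented by a formal normality test. The sketch below applies base R's `shapiro.test()` to two invented residual vectors, one right-skewed and one normal by construction. It was not part of the original analysis, and the vectors `skewed_res` and `normal_res` are purely illustrative.

```r
# Invented residual vectors built from theoretical quantiles
skewed_res <- qexp(ppoints(200)) - 1   # long right tail, centered near zero
normal_res <- qnorm(ppoints(200))      # normal by construction

# Shapiro-Wilk test: a small p-value is evidence against normality
p_skewed <- shapiro.test(skewed_res)$p.value
p_normal <- shapiro.test(normal_res)$p.value

# Sample skewness: positive values indicate a right tail
skewness <- mean((skewed_res - mean(skewed_res))^3) / sd(skewed_res)^3

c(p_skewed = p_skewed, p_normal = p_normal, skewness = skewness)
```

`shapiro.test()` accepts between 3 and 5000 observations. With samples in the thousands, even minor departures from normality tend to be flagged, so the test is best read alongside the histogram and QQ plot rather than in place of them.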

Show code
x <- ggplot(data = crime_lm, aes(x = count, y = res)) +
  geom_point() + theme_classic() +
  labs(title = "Residuals vs Dependent (Break-Ins)",
       x = "Number of Break-Ins",
       y = "Residual Error")

y1 <- ggplot(data = crime_lm, aes(x = elev, y = res)) +
  geom_point() + theme_classic() + 
  labs(title = "Residuals vs. Elevation",
       x = "Elevation (feet)",
       y = "Residual Error")

y2 <- ggplot(data = crime_lm, aes(x = slope, y = res)) +
  geom_point() + theme_classic() + 
  labs(title = "Residuals vs. Slope",
       x = "Slope (percent)",
       y = "Residual Error")

y3 <- ggplot(data = crime_lm, aes(x = median_income, y = res)) +
  geom_point() + theme_classic() + 
  labs(title = "Residuals vs. Income",
       x = "Median Income",
       y = "Residual Error")

(x + y1) / (y2 + y3)
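The residual panels produced by the code above give a visual check on whether the residual spread is constant. A numeric counterpart, which is not part of the original analysis, is the studentized (Koenker) form of the Breusch-Pagan test: regress the squared residuals on the predictors and compare \(n R^2\) to a chi-squared distribution. The sketch below computes it by hand in base R on simulated data whose error spread grows with `x`; the data frame `sim_het` is invented for illustration.

```r
# Simulated data (invented) with error variance that increases with x
set.seed(42)
n <- 200
sim_het <- data.frame(x = seq_len(n))
sim_het$y <- 2 + 0.5 * sim_het$x + rnorm(n, mean = 0, sd = sim_het$x)

# Main regression, then auxiliary regression of squared residuals on x
main_fit <- lm(y ~ x, data = sim_het)
sim_het$res_sq <- resid(main_fit)^2
aux_fit <- lm(res_sq ~ x, data = sim_het)

# Test statistic n * R^2, chi-squared with df = number of predictors (1 here)
bp_stat <- n * summary(aux_fit)$r.squared
bp_p    <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

c(statistic = bp_stat, p_value = bp_p)
```

A small p-value is evidence against constant variance. For the model in this post, the auxiliary regression would include all three predictors and the degrees of freedom would be 3.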

Since assumptions 1, 2, and 3 appear to be satisfied, OLS is an unbiased estimator in this case. However, the residual plots indicate that assumption 4 may not be satisfied, so OLS may not be the estimator with the lowest variance. An alternative estimator may therefore provide estimates of the true population parameters with less variance.
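One commonly used alternative for count outcomes is Poisson regression, which models the log of the expected count and does not assume normally distributed errors. The sketch below is only an illustration of that technique on simulated data. It is not the method used in this post, and the data frame `sim_counts` and its coefficients are invented.

```r
# Simulated count data (invented): expected counts fall with elevation and slope
set.seed(7)
n <- 500
sim_counts <- data.frame(
  elev  = runif(n, 0, 900),  # elevation in feet
  slope = runif(n, 0, 30)    # slope in percent
)
sim_counts$count <- rpois(
  n,
  lambda = exp(2 - 0.002 * sim_counts$elev - 0.02 * sim_counts$slope)
)

# Poisson regression with a log link, fitted by maximum likelihood
pois_fit <- glm(count ~ elev + slope,
                family = poisson(link = "log"),
                data = sim_counts)

coef(pois_fit)
```

As with the log-transformed OLS model, the coefficients are on the log scale, so `exp(coef(pois_fit))` gives multiplicative effects on the expected count. Whether a Poisson model suits the break-in data would depend on checks not performed here, such as whether the variance of the counts greatly exceeds their mean.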

Results

References

Kim, Young-An, and James C. Wo. 2021. “Topography and Crime in Place: The Effects of Elevation, Slope, and Betweenness in San Francisco Street Segments.” Journal of Urban Affairs 0 (0): 1–25. https://doi.org/10.1080/07352166.2021.1901591.